**Merging Teacher data and then measuring teacher quality*


clear all
use "C:\Users\X\RAdata_v1.dta"

* newyear is the year a cohort entered grade 1 (year 2015, 2016 or 2017)
gen newyear = .

replace newyear = 2015 if year==2015 & grade==1
replace newyear = 2015 if year==2016 & grade==2
replace newyear = 2015 if year==2017 & grade==3

replace newyear = 2016 if year==2016 & grade==1
replace newyear = 2016 if year==2017 & grade==2
replace newyear = 2016 if year==2018 & grade==3

replace newyear = 2017 if year==2017 & grade==1
replace newyear = 2017 if year==2018 & grade==2
replace newyear = 2017 if year==2019 & grade==3

*Drop those not in original data
drop if newyear==.

drop year
gen year = newyear
drop newyear
save "C:\Users\X\RAdata_v2.dta", replace

*Next steps: Need to decide semester in original data:

clear all
**Combine 2015, 2016 and 2017

use "C:\Users\X\2015v5.dta"
gen year=2015
append using "C:\Users\X\2016v5.dta", force
replace year =2016 if year==.
append using "C:\Users\X\2017v5.dta", force
replace year =2017 if year==.
*Drop unidentified IDs
drop if unique_ID==99999
*Generate new unique IDs so IDs do not overlap across the yearly datasets
*(stored as double so that large products are not rounded by float precision)
gen double newID = unique_ID*year
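
* Optional check (a sketch, not part of the original analysis): a product such as
* unique_ID*year is not guaranteed to be unique, since two different
* (unique_ID, year) pairs can multiply to the same value. The lines below count
* observations whose newID is shared by more than one pair; a count of 0 means
* there are no collisions. The names _tag and _ncombos are hypothetical helpers.
egen byte _tag = tag(unique_ID year)
bys newID: egen _ncombos = total(_tag)
count if _ncombos > 1
drop _tag _ncombos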




gen semester = .
* Note: in Stata, & binds more tightly than |, so the exam lists are grouped
* with inlist() to make the year condition apply to every exam in the list.

*Semester for those in 1st year of high school
* NOTE: Exam==12 appears in both 2015 lists, so the semester 2 line wins for it;
* check this against the exam calendar.
replace semester = 1 if inlist(Exam, 1, 11, 12) & year==2015
replace semester = 2 if inlist(Exam, 12, 13, 14) & year==2015

replace semester = 1 if inlist(Exam, 1, 11, 12, 13) & year==2016
replace semester = 2 if inlist(Exam, 14, 15, 16) & year==2016

replace semester = 1 if inlist(Exam, 1, 11, 12, 13, 14) & year==2017
replace semester = 2 if inlist(Exam, 15, 16, 17) & year==2017

*Semester for those in 2nd year of high school
replace semester = 1 if inlist(Exam, 21, 22, 23) & year==2015
replace semester = 2 if inlist(Exam, 24, 25, 26) & year==2015

replace semester = 1 if inlist(Exam, 21, 22, 23) & year==2016
replace semester = 2 if inlist(Exam, 24, 25, 26) & year==2016

replace semester = 1 if inlist(Exam, 21, 22, 23) & year==2017
replace semester = 2 if inlist(Exam, 24, 25, 26) & year==2017

*Semester for those in 3rd year of high school
replace semester = 1 if inlist(Exam, 31, 32, 33, 34) & year==2015
replace semester = 2 if inlist(Exam, 35, 36, 37, 38) & year==2015

replace semester = 1 if inlist(Exam, 31, 32, 33, 34) & year==2016
replace semester = 2 if inlist(Exam, 35, 36, 37) & year==2016

replace semester = 1 if inlist(Exam, 31, 32, 33, 34, 35) & year==2017
replace semester = 2 if inlist(Exam, 36, 37) & year==2017
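
* Optional check (a sketch, not part of the original analysis): cross-tabulate
* exams against the assigned semester within each cohort year, to confirm that
* every exam code lands in the intended semester and to see which codes
* (for example the entrance and exit exams) are left missing.
bys year: tab Exam semester, missing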


*Next we generate the year of high school (the grade the student is in)
gen grade = .
replace grade = 1 if inlist(Exam, 1, 11, 12, 13, 14, 15, 16, 17)
replace grade = 2 if inlist(Exam, 21, 22, 23, 24, 25, 26)
replace grade = 3 if inlist(Exam, 31, 32, 33, 34, 35, 36, 37, 38)

*Generate class
destring Class, gen(class) force

*Generate urban versus rural variable*


bys newID: egen urban2 = mean(urban)
replace urban2 = 0 if urban2==.
drop urban
ren urban2 urban



**Note: by using the join below, we won't have data for Exam==0 (HET) or Exam==99 (CET) in this same dataset
**Pairwise joining of data
joinby using "C:\Users\pmougani\Dropbox\peng&serena&pierre\track\data\Qingyang First High School\Main Analysis for all three years\RAdata_v2.dta", unmatched(master)
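
* Optional check (a sketch, not part of the original analysis): because no
* variable list is given, joinby matches on every variable common to both
* datasets. With unmatched(master) it also creates _merge, so tabulating it
* shows how many master observations found no match (_merge==1) versus how
* many were matched (_merge==3).
tab _merge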


sort newID Exam

*Generate running variable*
gen running = Total_mark if Exam==1
bys newID: egen running1 = mean(running)

*Year 2017 cutoff
gen normrunning = running1-425 if year==2017 
*year 2016 cutoff
replace normrunning = running1-467 if year==2016 
* Year 2015 cutoff 
replace normrunning = running1-480 if year==2015  


*Need to make changes to year 2017 top classroom which has coding error (top_class==2)
replace Top_class = 0 if Top_class==2 & year==2017


*Generate gender dummy variable
destring Gender, gen(sex) force
*Generate dummies for each year
tab year, gen(years)



*Generating high school entrance scores in main subjects (Math, English and Chinese) as control*
gen hsscore1 = Chinese + Math + English if Exam==0
*Below is to replace missing with whatever score we have
replace hsscore1 = Total_mark if Exam==0 & hsscore1==.
bys newID: egen hsscore = mean(hsscore1)

**Note to use first stage in first year of high school (since can be in different section in third year)
gen Top_classsec = Top_class if Exam==1
bys newID: egen Top_class1 = mean(Top_classsec)

*Variables for Regressions
  gen treatment = 1 if normrunning >=0 & normrunning!=.
replace treatment = 0 if normrunning <0
gen slope = normrunning*treatment


*Generate teacherrank variable
bys name: egen teacherrank = max(title02)

**Generate proportion of top teachers variable*
gen topteach = 1 if teacherrank==3
replace topteach = 0 if teacherrank==1 | teacherrank==2
  

*NEED TO GENERATE MATH GRADE FIRST before removing obs
*A) MATH

bys newID: egen firstyearMath = mean(Math) if inlist(Exam, 12, 13, 14, 15, 16, 17)

*Standardized by year
egen stmat1 = std(firstyearMath) if year==2015
egen stmat2 = std(firstyearMath) if year==2016
egen stmat3 = std(firstyearMath) if year==2017

gen stmath = stmat1
replace stmath = stmat2 if year==2016
replace stmath = stmat3 if year==2017

		
		
*B) Generate standardized GAOKAO exam results

*Creating standardized variables (by track)
*Science (Division==1)
egen stgaok1sci = std(Total_mark) if Exam==99 & year==2015 & Division==1
egen stgaok2sci = std(Total_mark) if Exam==99 & year==2016 & Division==1
egen stgaok3sci = std(Total_mark) if Exam==99 & year==2017 & Division==1

gen stgaok = stgaok1sci
replace stgaok = stgaok2sci if year==2016 & Division==1
replace stgaok = stgaok3sci if year==2017 & Division==1

*Arts (Division==0)
egen stgaok1art = std(Total_mark) if Exam==99 & year==2015 & Division==0
egen stgaok2art = std(Total_mark) if Exam==99 & year==2016 & Division==0
egen stgaok3art = std(Total_mark) if Exam==99 & year==2017 & Division==0

replace stgaok = stgaok1art if year==2015 & Division==0
replace stgaok = stgaok2art if year==2016 & Division==0
replace stgaok = stgaok3art if year==2017 & Division==0
	

** Generate peer quality (standardized high school entrance exam score, by year)
egen stgaok1111 = std(Total_mark) if Exam==0 & year==2015
egen stgaok1211 = std(Total_mark) if Exam==0 & year==2016
egen stgaok1311 = std(Total_mark) if Exam==0 & year==2017
gen stanHS = stgaok1111
replace stanHS = stgaok1211 if year==2016
replace stanHS = stgaok1311 if year==2017
		
** Generate total first year scores

*** Standardized total first year test scores
bys newID: egen firstyearscore = mean(Total_mark) if inlist(Exam, 12, 13, 14, 15, 16, 17)

*Standardized by year**
egen st1 = std(firstyearscore) if year==2015
egen st2 = std(firstyearscore) if year==2016
egen st3 = std(firstyearscore) if year==2017

gen st = st1
replace st = st2 if year==2016
replace st = st3 if year==2017
	
**Generate likelihood of going to any Chinese college
*Checking if college outcome is missing
bys newID: egen maxcoll1 = mean(if_FirstBatch)
gen College1 = 1 if Exam==99
* maxcoll1 is spelled out in full so the line does not rely on variable abbreviation
replace College1 = 0 if Exam==99 & maxcoll1==.
						
						
** Generate first-tier university attendance (widest definition)
gen firsttieradmit = 1 if if_FirstBatch==1
replace firsttieradmit = 0 if if_FirstBatch!= 1

**Generate top 100 (211 project) university attendance (2nd widest definition)
*Fixing errors with 211 schools (students who did not enter the first tier). These are instances where someone is in the top 100 but not in the first tier, which should not be possible.

replace if_211 = 0 if newID == 1100736 & Exam==99
replace if_211 = 0 if newID == 62496 & Exam==99
replace if_211 = 0 if newID == 429408 & Exam==99
	
gen top100admit = 1 if if_211==1
replace top100admit = 0 if if_211!=1


**3) Generate top 40 (985 project) university attendance (3rd widest definition)
*Fixing errors with 985 schools. These are instances where someone is in the top 40 (985) but not in the top 100 (211), which should not be possible.
		replace if_985 =0 if if_985==1 & if_211==0
		
gen top40admit = 1 if if_985==1
replace top40admit = 0 if if_985!=1 			
						
	
*Generate average teacher quality in year 1 and assign it to Exam ==99

bys class year: egen meanfirstyearclassteach= mean(teacherrank) if  Exam==12 | Exam==13 | Exam==14 | Exam==15 | Exam==16 | Exam==17 


bys newID: egen teachfirstyearqual = mean(meanfirstyearclassteach)


**Proportion TOP TEACHERS in year*

bys class year: egen meanfirstyeartopteach= mean(topteach) if  Exam==12 


bys newID: egen teachfirstyeartopteach = mean(meanfirstyeartopteach)



	
	
	
	*********************ANALYSIS BEGINS HERE****************************
	
	
**First year teacher analysis** Use the analysis below to determine whether top teachers are in the top track or not*


**Check teacher quality (continuous measure)
bys class: sum teachfirstyearqual  if year==2015 & Exam==12
bys class: sum teachfirstyearqual  if year==2016 & Exam==12
bys class: sum teachfirstyearqual if year==2017 & Exam==12

**Check proportion of top teachers only

bys class: sum teachfirstyeartopteach  if year==2015 & Exam==12
bys class: sum teachfirstyeartopteach  if year==2016 & Exam==12
bys class: sum teachfirstyeartopteach if year==2017 & Exam==12



*2015 cohort
gen samplehigh1new = 1 if year==2015 & (class==13 | class==14 | class==1 | class==3)
gen samplelow1new = 1 if year==2015 & samplehigh1new!=1
replace samplelow1new = 1 if year==2015 & (class==13 | class==14)

*2016 cohort
replace samplehigh1new = 1 if year==2016 & (class==1 | class==2 | class==4)
replace samplelow1new = 1 if year==2016 & samplehigh1new!=1
replace samplelow1new = 1 if year==2016 & (class==1 | class==2)

*2017 cohort
replace samplehigh1new = 1 if year==2017 & (class==1 | class==2 | class==14 | class==9 | class==12)
replace samplelow1new = 1 if year==2017 & samplehigh1new!=1
replace samplelow1new = 1 if year==2017 & (class==1 | class==2)
		
	
	cd "C:\Users\pmougani\Dropbox\peng&serena&pierre\Paper"

	
	
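* NOTE: exit stops the do-file here; the Table 5 code below only runs if it is
* executed separately or if this line is removed.
exit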






						*****************************************************************************************************************************************
					    ********************************************************Table 5---Mechanisms***************************************************************
						*****************************************************************************************************************************************





**START BY RUNNING SAMPLE FOR BOTH HIGH QUALITY TEACHERS
***PANEL A******

* Preserve the full data so that Panel B starts from the complete sample
* rather than from the subset kept for Panel A
preserve
keep if samplehigh1new==1

**First year class grades math (need to prob change treatment here)*

rdrobust stmath normrunning if Exam==12 & hsscore!=., covs(sex years1 years2 hsscore urban) bwselect(msetwo)
estimates store reg1

rdrobust stmath normrunning if Exam==12 & hsscore!=., covs(sex years1 years2 hsscore urban) kernel(uniform) h(43.747 27.428)
estimates store reg2

***COLLEGE ENTRANCE EXAM SCORES***

rdrobust stgaok normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban) bwselect(msetwo)
estimates store reg3

rdrobust stgaok normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban) kernel(uniform) h(40.400 24.715)
estimates store reg4

**Top 100 college attendance (wide definition)

rdrobust top100admit normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban) bwselect(msetwo)
estimates store reg5

rdrobust top100admit normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban) kernel(uniform) h(36.371 22.138)
estimates store reg6

**Top 40 college attendance (wide definition)

rdrobust top40admit normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban) bwselect(msetwo)
estimates store reg7

rdrobust top40admit normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban) kernel(uniform) h(41.021 20.669)
estimates store reg8

estout reg* using test.tex, replace cells(b(fmt(%9.3f) star label(Coef.)) se(fmt(%9.3f) par label(Std)))  starlevels(* 0.1 ** 0.05 *** 0.01) stats(N_b_l N_b_r, fmt(%9.0f) label(Eff. Number of obs)) style(tex)

estimates drop reg*

* Bring back the full data before restricting to the Panel B sample
restore


**Rerun whole analysis for different sample where teachers are low quality**

***PANEL B******

keep if samplelow1new==1
	
		
		
		**First year class grades math*
	
rdrobust stmath  normrunning if Exam==12 & hsscore!=. , covs(sex years1 years2 hsscore urban) bwselect ( msetwo )
	estimates store reg1

	rdrobust stmath  normrunning if Exam==12 & hsscore!=., covs(sex years1 years2 hsscore urban)	kernel(uniform)  h ( 32.226      17.373)
		estimates store reg2	
	
***COLLEGE ENTRANCE EXAM SCORES***

	
rdrobust stgaok  normrunning if Exam==99 & hsscore!=.  , covs(sex years1 years2 hsscore urban) bwselect ( msetwo )
	estimates store reg3

	rdrobust stgaok  normrunning if Exam==99 & hsscore!=.  , covs(sex years1 years2 hsscore urban)	kernel(uniform)  h ( 60.481      22.680)
		estimates store reg4
		
	
		
**Top 100 college attendance (wide definition)
				
	
rdrobust top100admit   normrunning if Exam==99 & hsscore!=. , covs(sex years1 years2 hsscore urban) bwselect ( msetwo )
	estimates store reg5

	rdrobust top100admit   normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban)	kernel(uniform)  h(54.326      23.118)
		estimates store reg6
		
		
				
**Top 40 college attendance (wide definition)
		
		
					
rdrobust top40admit   normrunning if Exam==99 & hsscore!=. , covs(sex years1 years2 hsscore urban) bwselect ( msetwo )
	estimates store reg7

	rdrobust top40admit   normrunning if Exam==99 & hsscore!=., covs(sex years1 years2 hsscore urban)	kernel(uniform)  h (47.588      19.902)
		estimates store reg8

			estout reg* using test2.tex, replace cells(b(fmt(%9.3f) star label(Coef.)) se(fmt(%9.3f) par label(Std)))  starlevels(* 0.1 ** 0.05 *** 0.01) stats(N_b_l N_b_r, fmt(%9.0f) label(Eff. Number of obs)) style(tex)
				
				estimates drop reg*
		
		
		
	